Abstract

We obtain an asymptotic expansion for the null distribution function of the gradient statistic for testing composite null hypotheses in the presence of nuisance parameters. The expansion is derived using a Bayesian route based on the shrinkage argument described in Ghosh and Mukerjee (1991). Using this expansion, we propose a Bartlett-type corrected gradient statistic with chi-square distribution up to an error of order $o(n^{-1})$ under the null hypothesis. Further, we also use the expansion to modify the percentage points of the large sample reference chi-square distribution. A small Monte Carlo experiment and various examples are presented and discussed.



Key-words: Asymptotic expansion; Bartlett-type correction; Bayesian route; Gradient statistic; Shrinkage argument.



1 Introduction 



The most common hypothesis tests for large samples are the likelihood ratio (Wilks, 1938), the Wald (Wald, 1943), and the Rao score (Rao, 1948) tests. These tests are widely used in areas such as economics, biology, and engineering, among others, since exact tests are not always available. An alternative test uses the gradient statistic recently proposed by Terrell (2002). An advantage of the gradient statistic over the Wald and the score statistics is that it does not involve knowledge of the information matrix, neither expected nor observed. Additionally, the gradient statistic is quite simple to compute. This has been emphasised by C.R. Rao (Rao, 2005), who wrote: 'The suggestion by Terrell is attractive as it is simple to compute. It would be of interest to investigate the performance of the [gradient] statistic'.

Let $x_1, \ldots, x_n$ be a random sample of size $n$ with joint probability density function $f(\cdot;\theta)$, which depends on a $p$-dimensional vector of unknown parameters $\theta = (\theta_1, \ldots, \theta_p)^\top$. Let $\ell(\theta) = n^{-1}\sum_{i=1}^{n}\log f(x_i;\theta)$ and $U(\theta) = \partial\ell(\theta)/\partial\theta$ be the log-likelihood function and the score vector, respectively; notice that, for convenience, both are divided by $n$. We wish to test the null hypothesis $\mathcal{H}_0: \theta_1 = \theta_{10}$ against the two-sided alternative hypothesis $\mathcal{H}_a: \theta_1 \neq \theta_{10}$, where $\theta_{10}$ is a fixed $q$-dimensional vector, $\theta_1 = (\theta_1, \ldots, \theta_q)^\top$ and $\theta_2 = (\theta_{q+1}, \ldots, \theta_p)^\top$. The partition of $\theta$ induces the corresponding partition of $U(\theta)$: $U(\theta) = (U_1(\theta)^\top, U_2(\theta)^\top)^\top$. Let $\hat\theta = (\hat\theta_1^\top, \hat\theta_2^\top)^\top$ and $\tilde\theta = (\theta_{10}^\top, \tilde\theta_2^\top)^\top$ be the unrestricted and the restricted (under $\mathcal{H}_0$) maximum likelihood estimators of $\theta = (\theta_1^\top, \theta_2^\top)^\top$, respectively. The gradient statistic for testing $\mathcal{H}_0$ is defined as

$$S = n\,U(\tilde\theta)^\top(\hat\theta - \tilde\theta), \tag{1}$$

and can also be written as $S = n\,U_1(\tilde\theta)^\top(\hat\theta_1 - \theta_{10})$, since $U_2(\tilde\theta) = 0$. Like the likelihood ratio, the Wald, and the score statistics, the gradient statistic has an asymptotic $\chi^2_q$ distribution under the null hypothesis, $q$ being the number of restrictions imposed by $\mathcal{H}_0$.

Equation (1) is the inner product of the score vector evaluated at $\mathcal{H}_0$ and the difference between the unrestricted and the restricted maximum likelihood estimators of $\theta$. Although the gradient statistic was derived by Terrell (2002) from the score and the Wald statistics, it is of a different nature. The score statistic measures the squared length of the score vector evaluated at $\mathcal{H}_0$ using the metric given by the inverse of the Fisher information matrix, whereas the Wald statistic gives the squared distance between the unrestricted and the restricted maximum likelihood estimators of $\theta$ using the metric given by the Fisher information matrix. Moreover, both are quadratic forms. The gradient statistic, on the other hand, is not a quadratic form and measures the distance between the unrestricted and the restricted maximum likelihood estimators of $\theta$ from a different perspective. It measures the orthogonal projection of the score vector at $\mathcal{H}_0$ on the vector $\hat\theta - \tilde\theta$.
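The inner-product form of the gradient statistic can be sketched in a few lines of Python. This is only an illustration: the helper names `gradient_statistic` and `exponential_score` are hypothetical, and the exponential model with mean $\phi$ is used purely as an assumed example in which the restricted estimator is the hypothesised value itself and the unrestricted estimator is the sample mean.

```python
def gradient_statistic(score, theta_hat, theta_tilde, n):
    """S = n * U(theta_tilde)^T (theta_hat - theta_tilde).

    `score` returns the score vector of the log-likelihood divided by n.
    """
    u = score(theta_tilde)
    return n * sum(uj * (th - tt) for uj, th, tt in zip(u, theta_hat, theta_tilde))


def exponential_score(xbar):
    """Score of n^{-1} * sum(-log(phi) - x_i / phi) for an exponential sample (assumed model)."""
    def score(theta):
        phi = theta[0]
        return [-1.0 / phi + xbar / phi ** 2]
    return score


data = [0.8, 1.7, 0.3, 2.9, 1.1]   # made-up observations
n = len(data)
xbar = sum(data) / n                # unrestricted MLE of phi
phi0 = 1.0                          # value of phi under the null hypothesis

S = gradient_statistic(exponential_score(xbar), [xbar], [phi0], n)
closed_form = n * (xbar - phi0) ** 2 / phi0 ** 2
```

No information matrix is evaluated anywhere in the computation; only the score at the restricted estimate and the two estimates are needed.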



Recently, the gradient test has been the subject of some research papers. In particular, Lemonte and Ferrari (2012a) obtained the local power of the gradient test under Pitman alternatives (a sequence of alternative hypotheses converging to the null hypothesis at the rate of $n^{-1/2}$). The authors compared the local power of the gradient test with those of the likelihood ratio, the Wald, and the score tests. They showed that none of the tests is uniformly more powerful than the others, and therefore, the gradient test is not only very simple to calculate but also competitive with the others in terms of local power. Other recent works in which the gradient test is investigated are Lemonte (2011, 2012) and Lemonte and Ferrari (2011, 2012b,c).

The main result in Lemonte and Ferrari (2012a) regarding the local power of the gradient test up to an error of order $o(n^{-1/2})$ represents the first step in the study of higher order asymptotic properties of the gradient test. In the present paper, we wish to go further by focusing on deriving the second-order approximation to the null distribution of the gradient statistic. In other words, our aim is to obtain an asymptotic expansion for the cumulative distribution function of the gradient statistic under the null hypothesis up to an error of order $o(n^{-1})$.

The usual route for deriving expansions for the distribution of asymptotic chi-square test statistics involves multivariate Edgeworth series expansions. Although such a route has been followed by many authors, it is extremely lengthy and tedious (see, for example, Hayakawa, 1977; Harris, 1985). Here, on the other hand, in order to derive an asymptotic expansion for the null distribution of the gradient statistic up to order $n^{-1}$, we follow a Bayesian route based on a shrinkage argument originally suggested by Ghosh and Mukerjee (1991) and described later in Mukerjee and Reid (2000). Although it uses a Bayesian approach, this technique can be used to solve frequentist problems, such as the derivation of Bartlett corrections and tail probabilities (Datta and Mukerjee, 2003).

Additionally, we obtain a Bartlett-type correction factor for the gradient statistic from the results in Cordeiro and Ferrari (1991). Under the null hypothesis, the corrected statistic is distributed as chi-square up to an error of order $o(n^{-1})$, while the uncorrected gradient statistic has a chi-square distribution up to an error of order $o(n^{-1/2})$; that is, the Bartlett-type correction factor reduces the approximation error from $o(n^{-1/2})$ to $o(n^{-1})$. For a detailed survey on Bartlett and Bartlett-type corrections, the reader is referred to Cordeiro and Cribari-Neto (1996).

The paper unfolds as follows. In Section 2 we present our main results, namely an asymptotic expansion for the cumulative distribution function of the gradient statistic and its Bartlett-type correction. In Sections 3 and 4 we particularise our general results to one-parameter families and to families with two orthogonal parameters, respectively. A small Monte Carlo study is also presented in Section 4. Section 5 closes the paper with a brief discussion. Technical details are collected in two appendices.



2 The main result 



First, let us introduce some notation. Let $D_j = \partial/\partial\theta_j$ $(j = 1, \ldots, p)$ be the differential operator. We define $U_j = D_j\ell(\theta)$, $U_{jr} = D_jD_r\ell(\theta)$, $U_{jrs} = D_jD_rD_s\ell(\theta)$, and so on. We make the same assumptions, such as the regularity of the first four derivatives of $\ell(\theta)$ with respect to $\theta$ and the existence and uniqueness of the maximum likelihood estimator of $\theta$, as those fully outlined by Hayakawa (1977). Let



.)■<' 



K 



jrs,u 



E(Uj rs U u J, K>j u ,rs 



E(UjU r ), Kj r = E(Uj r ), Kj rs = E(Uj rs 



jrsu 



E{Uj rsu ), Kj )TS = E{UjU r 



E{UjU u U rs ) + Kj U K rs , etc., denote 



the cumulants of log-likelihood derivatives. The cumulants are not functionally independent, for in- 



stance, K 



where k 



(1953a 



and niu 



jrs 



0) 



+ K 



■jrsu 



Hi q 



jrsu 



— K 



(r) 
jsu 



~\~ ftsu I5l 



$D_jD_r\kappa_{su}$, etc. Relations among them were first obtained by Bartlett (1953a,b). Further, let $K$ be the Fisher information matrix

$$K = ((\kappa_{j,r})) = -((\kappa_{jr})) = \begin{pmatrix} K_{11} & K_{12} \\ K_{21} & K_{22} \end{pmatrix},$$

with $K^{-1} = ((\kappa^{j,r}))$ denoting its inverse. Finally, define the matrices

$$A = ((a^{jr})) = \begin{pmatrix} 0 & 0 \\ 0 & K_{22}^{-1} \end{pmatrix}, \qquad M = ((m^{jr})) = K^{-1} - A.$$

In what follows, we use the Einstein summation convention, where $\sum'$ denotes summation over all components of $\theta$; that is, the indices $j$, $r$, $s$, $k$, $l$ and $u$ range over 1 to $p$. We now establish the following theorem.

Theorem 1. The asymptotic expansion for the null distribution of the gradient statistic for testing $\mathcal{H}_0: \theta_1 = \theta_{10}$ against $\mathcal{H}_a: \theta_1 \neq \theta_{10}$ is

$$\Pr(S \leq x) = G_q(x) + \frac{1}{24n}\sum_{i=0}^{3} R_i\,G_{q+2i}(x) + o(n^{-1}), \tag{2}$$

where $G_z(x)$ is the cumulative distribution function of a chi-square random variable with $z$ degrees of freedom, $R_1 = 3A_3 - 2A_2 + A_1$, $R_2 = A_2 - 3A_3$, $R_3 = A_3$, $R_0 = -(R_1 + R_2 + R_3)$,

A x = 3 ^ K jrs K klu a lu {3m jk a rs + m ir (K s ' k + 2a sk ) } 

+ 6 Y, K jrs, u m jr a su -&Y1 ( K i rsu + K irs,u) (m jr K s ' u + 2m ju a rs ) 
+ 6 Y^( K kiu + Kkl,u) [2(K jrs + K jr , s ) (n s ' j K r ' k K l ' u - a sj a rk a lu 
+ n s ' k K l ' j K r ' u - a sk a lj a ru ) - n jrs { (k s ' u + a su ) (/^'V' r - a jk a lr ) 
+ m jr (a sk a lu + K s ' k K l ' u ) + 2a rs (K j ' k K l ' u - a jk a lu ) + 2a rk a ls m ju ] 



m sk a lu + ^ m sk m lu + 3m kl a su\ + }_ m jk m rl m s 



A 2 — —3 ^ ] Kj rs K k [ u "I 

- 2{n klu + K M>U ) {m su (K j ' k K l ' r - a jk a lr ) + m jr ( K s ' k K l ' u - a sk a lu ) } 



$$A_3 = \frac{1}{12}\sum{}'\,\kappa_{jrs}\kappa_{klu}\,(9m^{jr}m^{sk}m^{lu} + 6m^{jk}m^{rl}m^{su}).$$

Proof. The proof is presented in Appendix 1. □

Basically, in order to prove Theorem 1 we follow a Bayesian route based on a shrinkage argument. This argument is described in Appendix 2.
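Once numerical values of $A_1$, $A_2$ and $A_3$ are available, the expansion in Theorem 1 is straightforward to evaluate. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the $o(n^{-1})$ remainder is simply dropped, and the chi-square distribution function is computed from the standard recurrence $G_{z+2}(x) = G_z(x) - (2x/z)\,g_z(x)$, where $g_z$ is the chi-square density.

```python
import math


def chi2_pdf(x, df):
    """Chi-square density with df degrees of freedom, for x > 0."""
    half = df / 2.0
    return math.exp((half - 1.0) * math.log(x) - x / 2.0
                    - half * math.log(2.0) - math.lgamma(half))


def chi2_cdf(x, df):
    """Chi-square cdf for integer df via G_{z+2}(x) = G_z(x) - (2x/z) g_z(x)."""
    if x <= 0.0:
        return 0.0
    if df % 2 == 1:
        z, cdf = 1, math.erf(math.sqrt(x / 2.0))   # G_1
    else:
        z, cdf = 2, 1.0 - math.exp(-x / 2.0)       # G_2
    while z < df:
        cdf -= 2.0 * x / z * chi2_pdf(x, z)
        z += 2
    return cdf


def expansion_cdf(x, q, n, A1, A2, A3):
    """Approximate Pr(S <= x): G_q(x) + (24n)^{-1} * sum_i R_i G_{q+2i}(x)."""
    R1 = 3 * A3 - 2 * A2 + A1
    R2 = A2 - 3 * A3
    R3 = A3
    R0 = -(R1 + R2 + R3)
    correction = sum(R * chi2_cdf(x, q + 2 * i)
                     for i, R in enumerate((R0, R1, R2, R3)))
    return chi2_cdf(x, q) + correction / (24.0 * n)
```

Because the $R_i$ sum to zero, the correction vanishes as $x \to \infty$, so the approximation still tends to one in the upper tail.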

If the null hypothesis is simple, we have $q = p$, $A = 0$ and $M = K^{-1}$. Therefore, an immediate consequence of Theorem 1 is the following corollary.

Corollary 1. The asymptotic expansion for the null distribution of the gradient statistic for testing $\mathcal{H}_0: \theta = \theta_0$ against $\mathcal{H}_a: \theta \neq \theta_0$ is given by (2) with $q = p$, $R_1 = 3A_3 - 2A_2 + A_1$, $R_2 = A_2 - 3A_3$, $R_3 = A_3$, $R_0 = -(R_1 + R_2 + R_3)$ and the $A$'s are $A_3 = \sum{}'\,\kappa_{jrs}\kappa_{klu}(9\kappa^{j,r}\kappa^{s,k}\kappa^{l,u} + 6\kappa^{j,k}\kappa^{r,l}\kappa^{s,u})/12$,

A\ = 6 ^ ^ (fcjrsu + Kjrs,ii)^' K 

+ 6 Y^( K klu + Kfct.u) {2(K jrs + K jTiS ) ( K "' i K r > k K l > U + K s,fc « r,tt ) 

f s,u j,k l,r I i^j^r s t k l t u\\ 
rhjrg \r\j rb re ~\~ rh rv 1% J c 

~h 12 ^ ^ iy^jrsu H~ ^j,rsu H~ ^jsu,r H~ ^ju.rs H~ ^j,u,rs) ^ ^ 5 



-K i > r K'> k K l > u + W'" 



We are now able to present a Bartlett-type corrected gradient statistic. A Bartlett-type correction is a multiplying factor, which depends on the statistic itself, that results in a modified statistic that follows a chi-square distribution with approximation error of order less than $n^{-1}$. Cordeiro and Ferrari (1991) obtained a general formula for a Bartlett-type correction for a wide class of statistics that have a chi-square distribution asymptotically. A special case is when the cumulative distribution function of the statistic can be written as (2), independently of the coefficients $R_1$, $R_2$, and $R_3$. Hence, from Theorem 1 and the results in Cordeiro and Ferrari (1991), we have the following corollary.

Corollary 2. The modified statistic

$$S^* = S\{1 - (c + bS + aS^2)\}, \tag{3}$$

where

$$a = \frac{A_3}{12nq(q+2)(q+4)}, \qquad b = \frac{A_2 - 2A_3}{12nq(q+2)}, \qquad c = \frac{A_1 - A_2 + A_3}{12nq},$$

has a $\chi^2_q$ distribution up to an error of order $o(n^{-1})$ under the null hypothesis.

The factor $\{1 - (c + bS + aS^2)\}$ in (3) can be regarded as a Bartlett-type correction factor for the gradient statistic in such a way that the null distribution of $S^*$ is better approximated by the reference $\chi^2$ distribution than the distribution of the uncorrected gradient statistic.
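The correction in Corollary 2 is a cubic polynomial in $S$ whose coefficients depend only on $A_1$, $A_2$, $A_3$, $q$ and $n$. A minimal sketch follows; the function names are hypothetical, and exact rational arithmetic is used only so that the coefficients can be compared with the closed forms reported in the examples.

```python
from fractions import Fraction


def bartlett_coefficients(A1, A2, A3, q, n):
    """Coefficients (a, b, c) of the Bartlett-type correction in Corollary 2."""
    A1, A2, A3 = Fraction(A1), Fraction(A2), Fraction(A3)
    a = A3 / (12 * n * q * (q + 2) * (q + 4))
    b = (A2 - 2 * A3) / (12 * n * q * (q + 2))
    c = (A1 - A2 + A3) / (12 * n * q)
    return a, b, c


def corrected_statistic(S, A1, A2, A3, q, n):
    """S* = S {1 - (c + b S + a S^2)}."""
    a, b, c = bartlett_coefficients(A1, A2, A3, q, n)
    return S * (1 - (c + b * S + a * S * S))


# With A1 = 0, A2 = 18, A3 = 20 and q = 1, the polynomial c + bS + aS^2
# equals (3 - 11 S + 2 S^2) / (18 n).
a, b, c = bartlett_coefficients(0, 18, 20, q=1, n=10)
```

Since the leading coefficient of the cubic is negative whenever $A_3 > 0$, $S^*$ is not monotone in $S$ for very large values of $S$; this is worth keeping in mind when $n$ is small.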

Instead of modifying the test statistic as in (3), we may modify the reference $\chi^2$ distribution using the inverse expansion formula in Hill and Davis (1968). To be specific, let $\gamma$ be the desired level of the test, and $x_{1-\gamma}$ be the $1 - \gamma$ percentile of the $\chi^2_q$ limiting distribution of the test statistic. From expansion (2), we have the following corollary.
Corollary 3. The asymptotic expansion for the $1 - \gamma$ percentile of $S$ to order $n^{-1}$ takes the form

$$z_{1-\gamma} = x_{1-\gamma} + \frac{1}{12n}\left[\frac{A_3\,x_{1-\gamma}}{q(q+2)(q+4)}\{x_{1-\gamma}^2 + (q+4)x_{1-\gamma} + (q+2)(q+4)\} + \frac{x_{1-\gamma}(x_{1-\gamma} + q + 2)}{q(q+2)}(A_2 - 3A_3) + \frac{x_{1-\gamma}}{q}(3A_3 - 2A_2 + A_1)\right], \tag{4}$$

where $\Pr(\chi^2_q \geq x_{1-\gamma}) = \gamma$.

In general, equations (3) and (4) depend on unknown parameters. In this case, we can replace these unknown parameters by their maximum likelihood estimates obtained under $\mathcal{H}_0$. It should be noticed that the improved gradient test of the null hypothesis $\mathcal{H}_0$ may be performed in three ways: (i) by referring the corrected statistic $S^*$ in (3) to the $\chi^2_q$ distribution; (ii) by referring the gradient statistic $S$ to the approximate cumulative distribution function (2); (iii) by comparing $S$ with the modified upper percentile in (4). These three procedures are equivalent to order $n^{-1}$.
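Procedure (iii) only requires the usual chi-square percentile and the three $A$'s. The sketch below (hypothetical function name) evaluates the modified percentile term by term, following the bracketed expression of Corollary 3. Collecting powers of $x_{1-\gamma}$ shows that the same quantity can be written as $x_{1-\gamma}\{1 + c + b\,x_{1-\gamma} + a\,x_{1-\gamma}^2\}$ with $a$, $b$, $c$ as in Corollary 2, which is a convenient numerical check.

```python
from fractions import Fraction


def modified_percentile(x, A1, A2, A3, q, n):
    """Modified percentile z for a chi-square(q) percentile x (remainder dropped)."""
    R1 = 3 * A3 - 2 * A2 + A1
    R2 = A2 - 3 * A3
    R3 = A3
    term3 = R3 * x * (x * x + (q + 4) * x + (q + 2) * (q + 4)) / (q * (q + 2) * (q + 4))
    term2 = R2 * x * (x + q + 2) / (q * (q + 2))
    term1 = R1 * x / q
    return x + (term1 + term2 + term3) / (12 * n)


# Usage with the 0.95 percentile of the chi-square(1) distribution:
x95 = 3.841458820694124
z95 = modified_percentile(x95, A1=0, A2=18, A3=20, q=1, n=10)
```

The percentile `x95` is supplied as a constant here; in practice it would come from a table or a statistical library.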

Finally, the first three moments, up to order $n^{-1}$ under the null hypothesis, of the gradient statistic are presented in the following corollary.

Corollary 4. The first three moments, up to order $n^{-1}$ under the null hypothesis, of the gradient statistic are

$$\mu_1'(S) = q + \frac{A_1}{12n}, \qquad \mu_2(S) = 2q + \frac{A_1 + A_2}{3n}, \qquad \mu_3(S) = 8q + \frac{2(A_1 + 2A_2 + A_3)}{n}.$$
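The moment formulas of Corollary 4 give a quick consistency check on any set of $A$'s. A minimal sketch (hypothetical function name), returning the approximate mean, variance and third central moment as exact fractions:

```python
from fractions import Fraction


def approximate_moments(A1, A2, A3, q, n):
    """Mean, variance and third central moment of S up to order 1/n (Corollary 4)."""
    A1, A2, A3 = Fraction(A1), Fraction(A2), Fraction(A3)
    mean = q + A1 / (12 * n)
    variance = 2 * q + (A1 + A2) / (3 * n)
    third = 8 * q + 2 * (A1 + 2 * A2 + A3) / n
    return mean, variance, third


# A1 = 0, A2 = 18, A3 = 20, q = 1 gives 1, 2 + 6/n and 8 + 112/n.
moments = approximate_moments(0, 18, 20, q=1, n=10)
```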

In the next sections, we consider some applications of the general results derived in this section in 
two special cases: a one-parameter model and a two-parameter model under orthogonality of param- 
eters. 



3 The one-parameter case 

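We initially assume that the model is indexed by a scalar unknown parameter, say $\phi$. The interest lies in testing the null hypothesis $\mathcal{H}_0: \phi = \phi_0$ against $\mathcal{H}_a: \phi \neq \phi_0$, where $\phi_0$ is a fixed value. Let $\kappa_{\phi\phi} = E(\partial^2\ell(\phi)/\partial\phi^2)$, $\kappa_{\phi\phi\phi} = E(\partial^3\ell(\phi)/\partial\phi^3)$, $\kappa_{\phi\phi\phi\phi} = E(\partial^4\ell(\phi)/\partial\phi^4)$, $\kappa_{\phi\phi}^{(\phi)} = \partial\kappa_{\phi\phi}/\partial\phi$, $\kappa_{\phi\phi\phi}^{(\phi)} = \partial\kappa_{\phi\phi\phi}/\partial\phi$, and $\kappa_{\phi\phi}^{(\phi\phi)} = \partial^2\kappa_{\phi\phi}/\partial\phi^2$. The gradient statistic for testing $\mathcal{H}_0$ is $S = nU(\phi_0)(\hat\phi - \phi_0)$, where $\hat\phi$ is the maximum likelihood estimator of $\phi$. Here, $A_1$, $A_2$, and $A_3$ given in Corollary 1 reduce to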

6k^{^2k,, — k^A + 12k — 2k 



A x - -3 , {?) 

A * ~ 773 ' W 

^K±i, 



6 



-4, 



5k 



4k 



(7) 



We now present some examples. 
Example 1. (Exponential distribution)

Let $x_1, \ldots, x_n$ be a random sample of an exponential distribution with density

$$f(x;\phi) = \frac{1}{\phi}e^{-x/\phi}, \qquad x > 0, \quad \phi > 0.$$

Here, $\kappa_{\phi\phi} = -1/\phi^2$, $\kappa_{\phi\phi\phi} = 4/\phi^3$, and $\kappa_{\phi\phi\phi\phi} = -18/\phi^4$. The gradient statistic assumes the form $S = n(\bar{x} - \phi_0)^2/\phi_0^2$, where $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$, which equals the score statistic. It is easy to see that $A_1 = 0$, $A_2 = 18$, and $A_3 = 20$. The first three moments (up to order $n^{-1}$) of $S$ are $\mu_1'(S) = 1$, $\mu_2(S) = 2 + 6/n$, and $\mu_3(S) = 8 + 112/n$. A partial verification of our results can be accomplished by comparing the exact moments of $S$ with the approximate moments given above. Since $n\bar{x}$ has a gamma distribution with shape parameter $n$ and rate parameter $1/\phi$, it can be shown that the first three exact moments of $S$ are $1$, $2 + 6/n$, and $8 + 112/n + 120/n^2$, respectively. These moments differ from the approximate moments obtained from Corollary 4 only in terms of order less than $n^{-1}$. The Bartlett-type corrected gradient statistic obtained from Corollary 2 is $S^* = S\{1 - (3 - 11S + 2S^2)/(18n)\}$.
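The exact moments quoted in Example 1 can be reproduced with exact rational arithmetic. The sketch below (hypothetical function names) relies on the standard fact that, under the null hypothesis, $Y = \bar{x}/\phi_0$ follows a gamma distribution with shape $n$ and scale $1/n$, so that $S = n(Y - 1)^2$; raw gamma moments are combined through the binomial expansion of $(Y - 1)^m$.

```python
from fractions import Fraction
from math import comb


def gamma_raw_moment(k, shape, scale):
    """E[Y^k] for Y ~ Gamma(shape, scale): prod_{i<k} (shape + i) * scale."""
    moment = Fraction(1)
    for i in range(k):
        moment *= (shape + i) * scale
    return moment


def exact_moments_exponential(n):
    """Exact mean, variance and third central moment of S = n (Y - 1)^2."""
    scale = Fraction(1, n)

    def centred(m):
        # E[(Y - 1)^m] by the binomial theorem
        return sum(comb(m, k) * (-1) ** (m - k) * gamma_raw_moment(k, n, scale)
                   for k in range(m + 1))

    e1 = n * centred(2)
    e2 = n ** 2 * centred(4)
    e3 = n ** 3 * centred(6)
    return e1, e2 - e1 ** 2, e3 - 3 * e2 * e1 + 2 * e1 ** 3


mean, variance, third = exact_moments_exponential(10)
```

For any $n$, the returned values agree with $1$, $2 + 6/n$ and $8 + 112/n + 120/n^2$, so only the third moment differs from the approximation of Corollary 4, and only by the $120/n^2$ term.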

Example 2. (One-parameter exponential family)

Let $x_1, \ldots, x_n$ be a random sample of size $n$ in which each $x_i$ has a distribution in the one-parameter exponential family with density

$$f(x;\phi) = \frac{1}{\zeta(\phi)}\exp\{-\alpha(\phi)d(x) + v(x)\},$$

where $\alpha(\cdot)$, $v(\cdot)$, $d(\cdot)$, and $\zeta(\cdot)$ are known functions. Also, $\alpha(\cdot)$ and $\zeta(\cdot)$ are assumed to have first three continuous derivatives, with $\zeta(\cdot) > 0$, $\alpha'(\phi)$, and $\beta'(\phi)$ being different from zero for all $\phi$ in the parameter space, where $\beta(\phi) = \zeta'(\phi)/\{\zeta(\phi)\alpha'(\phi)\}$. Here, primes denote derivatives with respect to $\phi$. For instance, $\beta' = \beta'(\phi) = d\beta(\phi)/d\phi$. It can be shown that $\kappa_{\phi\phi} = -\alpha'\beta'$, $\kappa_{\phi\phi\phi} = -(2\alpha''\beta' + \alpha'\beta'')$, and $\kappa_{\phi\phi\phi\phi} = -3\alpha''\beta'' - 3\alpha'''\beta' - \alpha'\beta'''$. The gradient statistic takes the form $S = n(\phi_0 - \hat\phi)\alpha'(\phi_0)\{\beta(\phi_0) + \bar{d}\}$, where $\bar{d} = n^{-1}\sum_{i=1}^{n}d(x_i)$. From (5), (6), and (7), we can write



A, 



a'P' 



A 1 



P" fAa" 0" 



P' 



a' 




a 



a: 



T 



a'P' \a! 2/3' 



We now present some special cases. 
1. Normal ($\phi > 0$, $\mu \in \mathbb{R}$, $x \in \mathbb{R}$):

• $\mu$ known: $\alpha(\phi) = 1/(2\phi)$, $\zeta(\phi) = \phi^{1/2}$, $d(x) = (x - \mu)^2$, and $v(x) = -\log(2\pi)/2$. We have $A_1 = 0$, $A_2 = 36$, and $A_3 = 40$. The first three moments of $S$ up to order $n^{-1}$ are $\mu_1'(S) = 1$, $\mu_2(S) = 2(1 + 6/n)$, and $\mu_3(S) = 8(1 + 28/n)$. The Bartlett-corrected gradient statistic is $S^* = S\{1 - (1 - 11S/3 + 2S^2/3)/(3n)\}$.

• $\phi$ known: $\alpha(\mu) = -\mu/\phi$, $\zeta(\mu) = \exp\{\mu^2/(2\phi)\}$, $d(x) = x$, and $v(x) = -x^2/(2\phi) - \log(2\pi\phi)/2$. Here, $A_1 = A_2 = A_3 = 0$, as expected.

2. Inverse normal ($\phi > 0$, $\mu > 0$, $x > 0$):

• $\mu$ known: $\alpha(\phi) = \phi$, $\zeta(\phi) = \phi^{-1/2}$, $d(x) = (x - \mu)^2/(2\mu^2x)$, and $v(x) = -\log(2\pi x^3)/2$. Here, $A_1 = 24$, $A_2 = 30$, and $A_3 = 10$, and the first three moments of $S$ are $\mu_1'(S) = 1 + 2/n$, $\mu_2(S) = 2 + 18/n$, and $\mu_3(S) = 8 + 188/n$. The Bartlett-corrected gradient statistic takes the form $S^* = S\{1 - (S + 2)(S + 3)/(18n)\}$.

• $\phi$ known: $\alpha(\mu) = \phi/(2\mu^2)$, $\zeta(\mu) = \exp(-\phi/\mu)$, $d(x) = x$, and $v(x) = -\phi/(2x) + \log\{\phi/(2\pi x^3)\}/2$. We have $A_1 = 0$ and $A_2 = A_3 = 45\mu/\phi$. The first three approximate moments of $S$ are $\mu_1'(S) = 1$, $\mu_2(S) = 2 + 15\mu/(n\phi)$, and $\mu_3(S) = 8 + 270\mu/(n\phi)$. Also, $S^* = S\{1 - \mu S(S - 5)/(4n\phi)\}$.

3. Gamma ($k$ known, $k > 0$, $\phi > 0$, $x > 0$): $\alpha(\phi) = \phi$, $\zeta(\phi) = \phi^{-k}$, $d(x) = x$, and $v(x) = (k - 1)\log x - \log\Gamma(k)$, where $\Gamma(\cdot)$ denotes the gamma function. We have $A_1 = 12/k$, $A_2 = 15/k$, $A_3 = 5/k$, and first three approximate moments $\mu_1'(S) = 1 + 1/(nk)$, $\mu_2(S) = 2 + 9/(nk)$, and $\mu_3(S) = 8 + 94/(nk)$. Also, $S^* = S\{1 - (S + 2)(S + 3)/(36nk)\}$.

4. Truncated extreme value ($\phi > 0$, $x > 0$): $\alpha(\phi) = 1/\phi$, $\zeta(\phi) = \phi$, $d(x) = \exp(x) - 1$, and $v(x) = x$. We have $A_1 = 0$, $A_2 = 12$, $A_3 = 20$, $\mu_1'(S) = 1$, $\mu_2(S) = 2 + 4/n$, $\mu_3(S) = 8 + 88/n$, and $S^* = S\{1 - (12 - 15S + 2S^2)/(18n)\}$.

5. Pareto ($\phi > 0$, $k > 0$, $k$ known, $x > k$): $\alpha(\phi) = 1 + \phi$, $\zeta(\phi) = (\phi k^{\phi})^{-1}$, $d(x) = \log x$, and $v(x) = 0$. Here, $A_1 = 12$, $A_2 = 15$, $A_3 = 5$, $\mu_1'(S) = 1 + 1/n$, $\mu_2(S) = 2 + 9/n$, $\mu_3(S) = 8 + 94/n$, and $S^* = S\{1 - (S + 2)(S + 3)/(36n)\}$.

6. Power (0 > 0, > 0, known, 2 > 0): a(<j>) = 1-0, £(0) = _1 0^, and u(a;) = 0. The 
A's, the first three approximate moments, and the Bartlett-type corrected statistic coincide with 
those obtained for the Pareto distribution. 

7. Laplace ($\phi > 0$, $k \in \mathbb{R}$, $k$ known, $x \in \mathbb{R}$): $\alpha(\phi) = \phi^{-1}$, $\zeta(\phi) = 2\phi$, $d(x) = |x - k|$, and $v(x) = 0$. We have $A_1 = 0$, $A_2 = 18$, $A_3 = 20$, $\mu_1'(S) = 1$, $\mu_2(S) = 2 + 6/n$, $\mu_3(S) = 8 + 112/n$, and $S^* = S\{1 - (3 - 11S + 2S^2)/(18n)\}$.
4 Models with two orthogonal parameters



The two-parameter families of distributions under orthogonality of the parameters (Cox and Reid, 1987), say $\phi$ and $\beta$, will be the subject of this section. The null hypothesis under test is $\mathcal{H}_0: \phi = \phi_0$, where $\phi_0$ is a fixed value, and $\beta$ acts as a nuisance parameter. The orthogonality between $\phi$ and $\beta$ leads to considerable simplification in the formulas of $A_1$, $A_2$, and $A_3$. Here, $\kappa_{\phi\phi\beta} = E(\partial^3\ell(\theta)/\partial\beta\,\partial\phi^2)$, $\kappa_{\phi\phi\beta}^{(\beta)} = \partial\kappa_{\phi\phi\beta}/\partial\beta$, etc. After some algebra, we have

$$A_1 = A_{1\phi} + A_{1\phi\beta}, \qquad A_2 = A_{2\phi} + A_{2\phi\beta}, \qquad A_3 = -\frac{5\kappa_{\phi\phi\phi}^2}{4\kappa_{\phi\phi}^3}, \tag{8}$$

where $A_{1\phi}$ and $A_{2\phi}$ are equal to $A_1$ and $A_2$ given in (5) and (6), respectively, and

3{4k^k^ + K^ppi^n^ - k^) } ^{K^pp - - 2ac^) 

/ll</>/3 - ~2 1 



3{2K00 ( g(2K^ > — + K <pl3p(2 K 'p > p ~ 3/%?/?) } 

2" ' 

3(3« <w «^ /3 + K^) 

^20/3 - ~2 • 



«4 

The expressions for $A_{1\phi\beta}$ and $A_{2\phi\beta}$ in (8) can be regarded as the additional contribution introduced in the expansion of the cumulative distribution function of the gradient statistic owing to the fact that $\beta$ is unknown and has to be estimated from the data. In the following, we present some examples.

Example 3. (Normal distribution)

Let $x_1, \ldots, x_n$ be a random sample from a normal distribution $N(\phi, \beta)$. The gradient statistic can be written in the form

$$S = n\,\frac{T_1/T_2}{1 + T_1/T_2},$$

where $T_1 = n(\bar{x} - \phi_0)^2$ and $T_2 = \sum_{i=1}^{n}(x_i - \bar{x})^2$, with $\bar{x} = n^{-1}\sum_{i=1}^{n}x_i$. Under the null hypothesis, $T_1/\beta$ and $T_2/\beta$ are independent with distributions $\chi^2_1$ and $\chi^2_{n-1}$, respectively. It can be shown that $n^{-1}S$ has a beta distribution with parameters $1/2$ and $(n - 1)/2$. The first three exact moments of $S$ are $1$, $2(n - 1)/(n + 2)$, and $8(n - 1)(n - 2)/\{(n + 2)(n + 4)\}$, respectively. Here, $A_1 = A_3 = 0$ and $A_2 = -18$. The first three approximate moments of $S$ are $\mu_1'(S) = 1$, $\mu_2(S) = 2 - 6/n$, and $\mu_3(S) = 8 - 72/n$. These moments differ from the exact moments only by terms of order less than $n^{-1}$. The Bartlett-type corrected gradient statistic is $S^* = S\{1 - (3 - S)/(2n)\}$.
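The exact moments in Example 3 follow from the beta distribution of $n^{-1}S$ and can be verified with exact fractions. This is a small sketch with hypothetical function names; it uses the standard formula $E(B^k) = \prod_{i<k}(a + i)/(a + b + i)$ for $B \sim \mathrm{Beta}(a, b)$.

```python
from fractions import Fraction


def beta_raw_moment(k, a, b):
    """E[B^k] for B ~ Beta(a, b)."""
    moment = Fraction(1)
    for i in range(k):
        moment *= Fraction(a + i) / (a + b + i)
    return moment


def exact_moments_normal(n):
    """Exact mean, variance and third central moment of S = n B, B ~ Beta(1/2, (n-1)/2)."""
    a, b = Fraction(1, 2), Fraction(n - 1, 2)
    raw = [n ** k * beta_raw_moment(k, a, b) for k in range(4)]
    mean = raw[1]
    variance = raw[2] - mean ** 2
    third = raw[3] - 3 * raw[2] * mean + 2 * mean ** 3
    return mean, variance, third


mean, variance, third = exact_moments_normal(10)
```

The gap between the exact variance $2(n-1)/(n+2)$ and the approximation $2 - 6/n$ is $12/\{n(n+2)\}$, which is of order $n^{-2}$ as claimed.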

Example 4. (Bivariate two-parameter exponential distribution)

Let $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots, x_{2n_2}$ be two independent random samples from exponential distributions with means $\mu$ and $\phi\mu$, respectively. It can be shown that $\phi$ and $\beta = \mu\phi^{1/2}$ are globally orthogonal. The parameter of interest is $\phi$, the ratio of the means, and the interest lies in testing $\mathcal{H}_0: \phi = 1$, which is equivalent to the equality of the two population means, against $\mathcal{H}_a: \phi \neq 1$. We consider the balanced case ($n_1 = n_2 = n/2$, $n > 2$ even). Let $\bar{x}_1$ and $\bar{x}_2$ be the sample means. The log-likelihood function can be written as

$$\ell(\phi, \beta) = -\log\beta - \frac{\phi^{1/2}\bar{x}_1 + \phi^{-1/2}\bar{x}_2}{2\beta}.$$

The gradient statistic for testing $\mathcal{H}_0$ takes the form

$$S = \frac{n(\bar{x}_1 - \bar{x}_2)^2}{4\bar{x}_1\bar{x}},$$

where $\bar{x} = (\bar{x}_1 + \bar{x}_2)/2$. The cumulants of log-likelihood derivatives are $\kappa_{\phi\phi} = -1/(4\phi^2)$, $\kappa_{\phi\phi\phi} = 3/(4\phi^3)$, $\kappa_{\phi\phi\phi\phi} = -45/(16\phi^4)$, $\kappa_{\beta\beta} = -1/\beta^2$, $\kappa_{\beta\beta\beta} = 4/\beta^3$, $\kappa_{\phi\beta} = 0$, $\kappa_{\phi\beta\beta} = 0$, $\kappa_{\phi\phi\beta} = 1/(4\beta\phi^2)$, and $\kappa_{\phi\phi\beta\beta} = -1/(2\beta^2\phi^2)$. From (8), we have $A_1 = 24$, $A_2 = 63$, and $A_3 = 45$. The corrected gradient statistic becomes $S^* = S\{1 - (S - 1)(S - 2)/(4n)\}$.
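For the balanced two-sample exponential problem, both the gradient statistic and its corrected version are simple functions of the two sample means. The following sketch (hypothetical function name, made-up data) computes them with exact fractions so that the arithmetic can be followed by hand.

```python
from fractions import Fraction


def gradient_test_two_exponentials(x1, x2):
    """Return (S, S*) for H0: equal means, balanced samples (n1 = n2 = n/2)."""
    if len(x1) != len(x2):
        raise ValueError("the closed form above assumes a balanced design")
    n = len(x1) + len(x2)
    m1 = Fraction(sum(x1)) / len(x1)
    m2 = Fraction(sum(x2)) / len(x2)
    xbar = (m1 + m2) / 2
    S = n * (m1 - m2) ** 2 / (4 * m1 * xbar)
    S_star = S * (1 - (S - 1) * (S - 2) / (4 * n))
    return S, S_star


S, S_star = gradient_test_two_exponentials([1, 1], [2, 2])
```

The correction factor equals one when $S = 1$ or $S = 2$, so the two statistics coincide at those values and differ elsewhere.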

Example 5. (Two-parameter Birnbaum-Saunders distribution)

The two-parameter Birnbaum-Saunders distribution was proposed by Birnbaum and Saunders (1969) and has cumulative distribution function in the form $G(x) = \Phi(v)$, with $x > 0$, where $v = \phi^{-1}\rho(x/\beta)$, $\rho(z) = z^{1/2} - z^{-1/2}$, and $\Phi(\cdot)$ is the standard normal cumulative distribution function; $\phi > 0$ and $\beta > 0$ are the shape and scale parameters, respectively. We wish to test $\mathcal{H}_0: \phi = \phi_0$ against the alternative hypothesis $\mathcal{H}_a: \phi \neq \phi_0$, where $\phi_0$ is a known positive constant. The gradient statistic to test $\mathcal{H}_0$ is

ff = 2 (£-M {I+f _ (2+ ^ }i 

where s = (n/3) _1 XlILi x *' ^ = SiLi X 7 X > an( ^ ^ * s me max i mum likelihood estimator of (3 
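obtained under $\mathcal{H}_0$. We have $\kappa_{\phi\phi} = -2/\phi^2$, $\kappa_{\phi\beta} = 0$, and $\kappa_{\beta\beta} = -\{1 + \phi(2\pi)^{-1/2}h(\phi)\}/(\phi^2\beta^2)$, where $h(\phi) = \phi(\pi/2)^{1/2} - \pi e^{2/\phi^2}\{1 - \Phi(2/\phi)\}$. After some algebra, we obtain $A_{1\phi} = -3$, $A_{2\phi} = 69/8$, $A_{2\phi\beta} = -45(2 + \phi^2)/[2\{1 + \phi(2\pi)^{-1/2}h(\phi)\}]$, $A_3 = 125/8$, and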

9 - 150 2 /2 3(0 2 + 2) f 2(4 + 2 )fr(0) ) 

10/3 1 + 0(270-1/2^(0) 2{l + 0(27r)-V2/ i (0) } 2\ ^ + <?M + ^ j- 



Since the necessary quantities to obtain the $A$'s were derived, a Bartlett-corrected gradient statistic may be obtained from Corollary 2. It is interesting to note that the $A$'s do not depend on the unknown scale parameter $\beta$. Next, we shall present a small Monte Carlo simulation regarding the test of the null hypothesis $\mathcal{H}_0: \phi = 1$.

The simulations were performed by setting $\beta = 1$ and sample sizes ranging from 5 to 22 observations. All results are based on 10,000 replications. The size distortions (i.e. estimated minus nominal sizes) for the 5% nominal level of the gradient statistic and its Bartlett-corrected version for different sample sizes are plotted in Figure 1(a). It is clear from this figure that the Bartlett-corrected test displays smaller size distortions than the original gradient test.
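A size simulation of this kind is easy to set up. The sketch below is not the Birnbaum-Saunders experiment reported above: to stay self-contained it uses the exponential model of Example 1 (where $S = n(\bar{x} - \phi_0)^2/\phi_0^2$ and $A_1 = 0$, $A_2 = 18$, $A_3 = 20$), and it compares $S$ with both the usual and the modified 5% critical values. The function name, seed and number of replications are arbitrary choices made for illustration.

```python
import random


def simulate_sizes(n=10, reps=20000, seed=12345):
    """Estimated null rejection rates at the 5% level, exponential model, phi0 = 1."""
    rng = random.Random(seed)
    x95 = 3.841458820694124                  # 0.95 percentile of chi-square(1)
    # Coefficients of Corollary 2 for A1 = 0, A2 = 18, A3 = 20, q = 1
    a, b, c = 20 / (180 * n), -22 / (36 * n), 2 / (12 * n)
    z95 = x95 * (1 + c + b * x95 + a * x95 ** 2)   # modified critical value
    reject_plain = 0
    reject_modified = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _obs in range(n)) / n
        S = n * (xbar - 1.0) ** 2
        reject_plain += S > x95
        reject_modified += S > z95
    return reject_plain / reps, reject_modified / reps


size_plain, size_modified = simulate_sizes()
```

Each returned rate is a Monte Carlo estimate, so differences of a few tenths of a percentage point between runs with different seeds are to be expected.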

Finally, we set $n = 10$ and consider the first-order approximation ($\chi^2_1$ distribution) for the distribution of the gradient statistic and the expansion obtained in this paper. Figure 1(b) presents the curves. The difference between the curves is evident from this figure, and hence, the $\chi^2_1$ distribution may not be a good approximation for the null distribution of the gradient statistic in testing the null hypothesis $\mathcal{H}_0: \phi = 1$ for the two-parameter Birnbaum-Saunders model if the sample is small.




Figure 1: (a) Size distortion of the gradient test (solid) and the Bartlett-corrected gradient test (dashes); (b) first-order approximation (solid) and expansion to order $n^{-1}$ (dashes) of the null cumulative distribution function of the gradient statistic.



5 Discussion 



Lemonte and Ferrari (2012a) showed that the gradient test can be an interesting alternative to the classic large-sample tests, namely the likelihood ratio, the Wald, and the Rao score tests, since none is uniformly superior to the others in terms of second-order local power. Additionally, as remarked before, the gradient statistic does not require obtaining, estimating, or inverting an information matrix, unlike the Wald and the Rao score statistics. Its formal simplicity is always an attraction.

The exact null distribution of the gradient statistic is usually unknown and the test relies upon an 
asymptotic approximation. The chi-square distribution is used as a large-sample approximation to 
the true null distribution of this statistic. However, for small sample sizes, the chi-square distribution 
may be a poor approximation to the true null distribution; that is, the asymptotic approximation may 
deliver inaccurate inference. In order to overcome this shortcoming, an alternative strategy is to use higher-order asymptotic theory.

The asymptotic expansion up to order $n^{-1}$ for the null distribution function of the gradient statistic was derived in this paper. A Bayesian route based on the shrinkage argument (Ghosh and Mukerjee, 1991; Mukerjee and Reid, 2000) proved to be extremely useful in this context. The expansion is very general in the sense that the null hypothesis can be composite in the presence of nuisance parameters. We show that the coefficients which define this expansion depend on the joint cumulants of log-likelihood derivatives for the full data. Unfortunately, these coefficients are very difficult to interpret in generality.



Cordeiro and Ferrari (1991) showed that, quite generally, continuous statistics having a chi-square distribution asymptotically can be modified by a suitable correction term that makes the modified statistic have chi-square distribution to order $n^{-1}$. Their work can be viewed as an extension of Bartlett corrections to the likelihood ratio statistic (Lawley, 1956) to other statistics having a chi-square distribution asymptotically. The correction term comes from the coefficients of the $O(n^{-1})$ term in the expansion of the cumulative distribution function of the test statistic in such a way that it becomes better approximated by the reference chi-square distribution. It is known as the Bartlett-type correction. It is well known that Bartlett and Bartlett-type corrections have become a widely used method for improving the large-sample chi-square approximation to the null distribution of the likelihood ratio and Rao score statistics, respectively. In recent years there has been a renewed interest in Bartlett factors and several papers have been published giving expressions for computing these corrections for special models. Some references are Zucker et al. (2000), Lagos and Morettin (2004), Tu et al. (2005), van Giersbergen (2009), Bai (2009), and Lagos et al. (2010).

From the general expansion derived in this paper and using results in Cordeiro and Ferrari (1991), we also obtained a Bartlett-type correction factor for the gradient statistic. Our results are very general and not tied to special classes of models. They allow the parameter vector to be multidimensional and are valid regardless of whether nuisance parameters are present or not. Additionally, as the coefficients in the expansion, and consequently in the Bartlett-type correction factor, are written as functions of cumulants of log-likelihood derivatives, they can be obtained for all the classes of parametric models for which those cumulants can be determined. Therefore, applications of our general results in several parametric models, such as the generalised linear models and extensions, can be studied in future research.
Appendix 1



Proof of Theorem 1

Except when indicated, the indices j, r, s, u, v, and w range over 1 to p and the indices f, r', s', v! , v', and 
w' range over 1 to q. Also, an array index repeated as both a superscript and a subscript indicates an implied 
summation over the appropriate range. Let Xj r = —ipj r = —{DjD r £(6)} g= g, tpj rs = {DjD r D s £(6)} g= g, 
ipjrsu = {DjD r D s D u £(O)} 0= Q, etc. The matrix A = ((A jr )) is the observed information matrix evaluated at 
6. The partition of = (Oj ,0j) T induces the partition 



A = ((A jr )) 



An A12 
A21 A22 



((A*)) 



A 11 A 12 
A 21 A 22 



Let A 11 1 = ((Ai wY )), crJ' r = \i r - A jW \ lw >f\ j ' r , t«" = \ jw '\i w >f, 
•yr's'u' = A jV A sV [3], and xf 
mation with the number in brackets indicating the number of terms obtained by permutation of indices. For 
instance, a su a vw [3] = a su a vw + a sv a uw + a sw a uv . Let e = (ei, . . . , e q ) T = n 1 / 2 (6 1 - Q{), = 



a {1) 

v suvw 



a su a vw [3], \f r , s , ul = A^' V A sV [3], and A$ VuW = \i' r ' \ s ' u ' X"' W '[15], where [•] denotes a sum- 



l/> jr ,<T r 'T»' /2, = il jrs T"'T rr, T SS, /6, 



' 3 r 



i ' (vxv>/ a tin' 



^j'r's'u' — 24 {^7 rslt ~^ ^ {^^jrs^uvw ~\~ 3lpj rv l/j S uw)} T T T 

Lemma 1. An asymptotic expansion under the null hypothesis for the gradient statistic (1) is



Proof. Using a procedure analogous to that of Chang and Mukerjee (2011), the result holds.



(9) 
□ 



Let 7r = n(0) be a prior density for 6, nj = Djir(6), irj r = DjD r ir(6), tt = tt(0), ttj = 7Tj(6), 

Tfj r = 7Tj r (0), 



j'r' 



2ft 4%> s «°" 



(1) I rr' 
ri>ui f " suvw f 1 1 



(i) _ flvW ^ 



r = * + ^r- 



7T 



(4) T (4) TTm , 



yr's'u' yr's'u' 1 g^T'J 7 * 



From Ghosh and Mukerjee (1991), Chang and Mukerjee (2010) derive an expansion up to order $n^{-1}$ for the marginal posterior density of $e$, which takes the form



ir pos t(e) = 4> q (e;A 



1 



1 + ^( r f )e f+ F fl^j'^ s/ ) 



(3) 



+ ^{ r jv( e i' e r' - A jV ) + ^^(e^ve^eu/ - A$, sV ) 



(10) 



+ o(n x ), 
where $\phi_q(z; \Sigma)$ denotes the density of the $q$-variate normal distribution with mean $0$ and covariance matrix $\Sigma$.

We now follow the Bayesian route described in Mukerjee and Reid (2000); see Appendix 2.

Step 1. The approximate posterior characteristic function of $S$ is

$$M_\pi(t) = E_\pi\{\exp(\xi S)\} = \int \exp(\xi S)\,\pi_{\mathrm{post}}(e)\,de,$$

where $\xi = it$ with $i = (-1)^{1/2}$. From Lemma 1 and after some algebra, we can write

ex P (£S> post (e) = (1 - 20~ q/ % U ^-^\ 1 + - S^w^ey + r^ejj 



fl4^ {4) +* (3) f3r (1) -4^ (1) U -r {4) 

£\ j'r's'u' ^ j'r's' \ u' u' IS L j'r's'u' 



efe r >e s >e u i 



" r 1 j'V V fc i fcr A ) L j'r's'u' A j'r's'u' 



fV^ r( 4 ) 



+ o p (n x ). 



Now, by writing $\xi = -\frac{1}{2}(1 - 2\xi) + \frac{1}{2}$, $\xi^2 = \frac{1}{4}(1 - 2\xi)^2 - \frac{1}{2}(1 - 2\xi) + \frac{1}{4}$, and assuming that $\theta$ is in the interior of the support of $\pi$, we obtain after some algebra



$$M_\pi(t) = (1 - 2\xi)^{-q/2}\left\{1 + \frac{1}{n}\sum_{i=0}^{3}H_i(1 - 2\xi)^{-i}\right\} + o_p(n^{-1}), \tag{11}$$

where $H_0 = -(H_1 + H_2 + H_3)$,



n\ — ji r i s i^ u i v i w i^ji r i s i u i v i w i 

+ a (1) /2^ (4) -vi/® v&^h + i**&( 3 ) r (1) 

A j'r's'u' \ ^{^ j'r's'u' *jW*ii' ' 2 j' r ' s u ' 



if =--^ (3) ^ (3) A {2) 

2 ^ j'r's' u'v'w' j 'r's'u'v'w ' 



+ a (1) /r (4) -2fM/ {4) -^ (3) - V 3) r (1) l 

j'r' s'u' | j'r's'u' j'r's'u' ^j'r's'^u') 2 f r ' s u ' f ' 



rr = I^( 3 ) $( 3 ) \ {2) 

n j'r's' u'v'w' j'r's'u'v'w' ' 



Step 2. Let $\bar\pi(\cdot)$ be an auxiliary prior density for $\theta$ satisfying the conditions in Bickel and Ghosh (1990). We now obtain an approximate posterior characteristic function of $S$ under the prior $\bar\pi(\cdot)$, say $M_{\bar\pi}(t)$. From (11), we have

$$M_{\bar\pi}(t) = (1 - 2\xi)^{-q/2}\left\{1 + \frac{1}{n}\sum_{i=0}^{3}\bar{H}_i(1 - 2\xi)^{-i}\right\} + o_p(n^{-1}),$$

where $\bar{H}_i$ denotes the counterpart of $H_i$ obtained by replacing $\pi(\cdot)$ with $\bar\pi(\cdot)$. After some algebra, we have
A(0) = Eo(AU) = (1 
where $J_0 = -(J_1 + J_2 + J_3)$,

Ji = ^K jrs K uvw (9m jr m su m vw + Qm ju m rv m sw ) + -(K jrsu + 3K jrv K suw a vw )m jr m su 

3 3 TT U 

~\~ Ki j f s f^uvw Ck TH? Tfl ~\~ f\>j>pgTH? Tfl _ 
8 4 7T 

-{- ^ ^ ~^^jrsu^ ~^(2Kj rs K uvw -\- 3tZj su K rvw Cl CL ) 

7T 7T J 

J 2 = -^K jrs K uvw (9m jr m su m vw + 6m ju m rv m sw ) - -(Kj rsu + 3K j ™K snio a^)m ir m su 
3 ■ 3 ■ 7f M 

Kn 

8 4 7T 



+ 3m^m s " 



Step 3. We now compute 



1 1 7T 

— — \Kj' rsu + (2Kj rs K uvw + ^Kj rv K suw ) CL } -|- -Ki>s~ 

24 v 7 6 7T 

J 3 = ^K J r S K«™(9m 3r m s "m™ + 6m^m™m OT ). 



y A(0)#(0)d0 = (1 - 20~ q/2 |l + ^ X^ 1 - 2 _< / j + 

by integrating the $J$'s with respect to $\bar\pi$. After integrating each term that depends on the prior distributions and by allowing $\bar\pi(\cdot)$ to converge weakly to the degenerate prior at the true value of $\theta$, we arrive at

$$E_\theta\{\exp(\xi S)\} = (1 - 2\xi)^{-q/2}\left\{1 + n^{-1}\sum_{i=0}^{3}\bar{A}_i(1 - 2\xi)^{-i}\right\} + o(n^{-1}),$$

where the $\bar{A}$'s are functions of cumulants of log-likelihood derivatives. By writing $d = 2\xi/(1 - 2\xi)$ and using the fact that $\sum_{i=0}^{3}\bar{A}_i = 0$, we arrive at

$$M(t) = (1 - 2\xi)^{-q/2}\left\{1 + \frac{1}{24n}(A_1d + A_2d^2 + A_3d^3)\right\} + o(n^{-1}), \tag{12}$$

with $A_1 = 24(\bar{A}_1 + 2\bar{A}_2 + 3\bar{A}_3)$, $A_2 = 24(\bar{A}_2 + 3\bar{A}_3)$, and $A_3 = 24\bar{A}_3$. We can write

Ai = 12DjD r m jr - QD u (K jrs m jr m su ) - 12D u (K jrs m jr a su ) - l2D r (K jsu m jr a su ) 

(m jr m su a vw + 2m ir a™a™) + 

A 2 = 6 J D u (K irs m J> m su ) - 3/c irs K u ™ ^m J 'Wa™ + ^m ir ro™m TO + im^WV^ 

o /Sj j T su 77L 77L 

tv^suvjTCI Tfl CI , 

^ 3 = ^K jrs K uvw (9m jr m su m vw + 6m ju m rv m sw ). 

Inverting $M(t)$ in (12) and interchanging the indices in a suitable manner, after some algebra, we arrive at the expressions for $A_1$, $A_2$, and $A_3$ as given in Theorem 1.
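The change of variable $d = 2\xi/(1 - 2\xi)$ used to reach (12) can be spelled out as a short worked step. The notation $\bar{A}_i$ for the coefficients of $(1 - 2\xi)^{-i}$ before the substitution is an assumption made here for readability. Since $(1 - 2\xi)^{-1} = 1 + d$ and $\sum_{i=0}^{3}\bar{A}_i = 0$,

```latex
\begin{align*}
\sum_{i=0}^{3}\bar{A}_i(1-2\xi)^{-i}
  &= \sum_{i=0}^{3}\bar{A}_i(1+d)^{i} \\
  &= \Bigl(\sum_{i=0}^{3}\bar{A}_i\Bigr)
     + (\bar{A}_1 + 2\bar{A}_2 + 3\bar{A}_3)\,d
     + (\bar{A}_2 + 3\bar{A}_3)\,d^{2}
     + \bar{A}_3\,d^{3} \\
  &= \frac{1}{24}\bigl(A_1 d + A_2 d^{2} + A_3 d^{3}\bigr).
\end{align*}
```

The same algebra applied to the coefficients of Theorem 1 gives $R_1 + 2R_2 + 3R_3 = A_1$, $R_2 + 3R_3 = A_2$ and $R_3 = A_3$, which is how the $R_i$ in the expansion of the distribution function relate to the $A$'s in the characteristic function.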
Appendix 2



The Shrinkage Argument

Let $x = (x_1, \ldots, x_n)^\top$ be a random vector with density $f(\cdot; \theta)$, where $\theta \in \Theta$ is a $p$-dimensional parameter and $\Theta \subset \mathbb{R}^p$ is an open subset of the Euclidean space. Let $Q(\cdot, \theta)$ be a measurable function. Assume that $Q$ is continuous for all $\theta$ and that its expectation exists. A Bayesian route for obtaining $E_\theta\{Q(\cdot, \theta)\}$ based on a shrinkage argument involves the three steps described below.

Step 1. Obtain $E_\pi\{Q(X, \theta) \mid X = x\}$, the posterior expectation of $Q$ under the prior $\pi(\cdot)$ for $\theta$.

Step 2. Find $E_\theta[E_\pi\{Q(X, \theta) \mid X = x\}] = \lambda(\theta)$, for $\theta \in \mathrm{int}_s(\pi)$, where $\mathrm{int}_s(\pi)$ denotes the interior of the support of $\pi$.

Step 3. Integrate $\lambda(\theta)$ with respect to $\pi(\cdot)$ and allow $\pi(\cdot)$ to converge weakly to the degenerate prior at $\theta$, where $\theta \in \mathrm{int}_s(\pi)$. This yields $E_\theta\{Q(X, \theta)\}$.



A detailed justification can be found in Mukerjee and Reid (2000).
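The three steps can be followed in closed form in a toy problem. The sketch below is an illustration only and is not part of the derivation in Appendix 1: it assumes a sample of size $n$ from $N(\theta, 1)$, a conjugate $N(m, \tau^2)$ prior, and the function $Q(X, \theta) = n(\theta - \bar{X})^2$, whose frequentist expectation is known to equal one. All names are hypothetical.

```python
from fractions import Fraction


def shrinkage_steps(n, m, tau2):
    """Steps 1-3 for Q = n (theta - Xbar)^2, N(theta, 1) data, N(m, tau2) prior."""
    n, m, tau2 = Fraction(n), Fraction(m), Fraction(tau2)
    precision = n + 1 / tau2
    w = (1 / tau2) / precision        # posterior mean - Xbar = w * (m - Xbar)
    post_var = 1 / precision

    # Step 1: E_pi{Q | X} = n * [w^2 (m - Xbar)^2 + post_var]
    # Step 2: take E_theta, using E_theta (m - Xbar)^2 = (m - theta)^2 + 1/n
    def lam(theta):
        return n * (w ** 2 * ((m - Fraction(theta)) ** 2 + 1 / n) + post_var)

    # Step 3: integrate lam against the prior, using E_pi (m - theta)^2 = tau2
    integrated = n * (w ** 2 * (tau2 + 1 / n) + post_var)
    return lam, integrated


lam, integrated = shrinkage_steps(n=10, m=0, tau2=Fraction(1, 4))
```

In this toy case the integral in Step 3 equals one for every prior variance, and $\lambda(m)$ tends to one as $\tau^2 \to 0$, so shrinking the prior to a point mass recovers $E_\theta\{n(\bar{X} - \theta)^2\} = 1$.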